-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Sixel images in conhost #17421
Conversation
holy shit |
James, I just derailed a team meeting to direct everyone's attention over here :D |
bool _eraseCells(const til::CoordType columnBegin, const til::CoordType columnEnd); | ||
|
||
til::size _cellSize; | ||
std::vector<RGBQUAD> _pixelBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I'm using an RGBQUAD
here just because it was convenient for the GDI renderer, and I figured the Atlas renderer might not care what format it's given. But if this turns out to be a problem, it shouldn't be that big a deal to change to something more generic.
okay the bar for the best PR of the year is now very, very high |
src/buffer/out/ImageSlice.cpp
Outdated
auto eraseIterator = _pixelBuffer.begin() + eraseOffset; | ||
for (auto y = 0; y < _cellSize.height; y++) | ||
{ | ||
std::fill_n(eraseIterator, eraseLength, RGBQUAD{}); | ||
std::advance(eraseIterator, _pixelWidth); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The for loop advances the iterator by _cellSize.height * _pixelWidth
in total which is equal to the _pixelBuffer
size. If the eraseOffset
is greater than 0, the final loop iteration will leave the iterator past the end of the _pixelBuffer
. Since the MSVC STL uses checked iterators this results in a debug assertion.
One way to fix this:
auto eraseIterator = _pixelBuffer.begin();
for (auto y = 0; y < _cellSize.height; y++)
{
std::fill_n(eraseIterator + eraseOffset, eraseLength, RGBQUAD{});
std::advance(eraseIterator, _pixelWidth);
}
FYI fill_n
isn't super duper optimal for clearing bytes in this loop and you may be better off using memset
directly. Mostly because it skips an unnecessary emptiness check. Something like this:
auto eraseIterator = _pixelBuffer.data() + eraseOffset;
for (auto y = 0; y < _cellSize.height; y++)
{
memset(eraseIterator, 0, eraseLength * sizeof(RGBQUAD));
eraseIterator += _pixelWidth;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing this up. It never occurred to me that it would be a problem using std::advance
passed the end of the buffer, but it makes perfect sense in retrospect. I think I only starting using it because I was getting complaints from the auditor about pointer arithmetic in some places, so I'll be happy to go back to +=
. If there are still pointer arithmetic warnings I'll find another way to deal with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out +=
isn't any better than std::advance
since it still has the debug asserts. But I found that switching to raw pointers - and continuing to use the std::advance
for incrementing - was enough to keep everyone happy. Debug build doesn't appear to assert anymore, and audit still passes.
Same day delivery: d7b002f After working with this branch for a bit, I think the most important feedback I can give is that I think it'd be preferable if we shared a single
I'm happy to help with any parts of this implementation as needed. 🙂 As an aside, I continue to think that the usage of Reason being, we never parse the same type of DCS simultaneously per VT parser instance, right? So, couldn't we call the parse method of each corresponding implementation directly and keep the state as a struct around forever? This would turn the dynamic into static dispatch. Additionally, it would be excellent for performance if we could pass entire strings of input to each parser (tighter = faster parsing loops). The parser could for instance indicate to the caller whether it's done parsing and at which character offset. |
Works like a dream! It's hard to tell, because I think most of the processing time is spent in the parser, but this does seem faster to me than the GDI renderer.
I expected you'd probably want to rewrite a lot of the buffer management code, and you're welcome to do so, but I'm not sure this particular change makes sense, and I fear you may not understand exactly how sixel graphics are intended to work. Most importantly, the kitty image protocol is fundamentally incompatible with sixel. If you intend to implement the kitty protocol, you'll need to manage its image content separately. It works in a totally different way. The way the hardware graphics terminals work, the screen is essentially one large bitmap, and everything writes to that bitmap. You output some text, that's drawn onto the bitmap. You output some image content, that's drawn on top of it. Because you don't want a garbled mix of text when you write over existing text, it always clears the cell that it's writing to. But that also means that text output over an area of image content, will first punch a hole in that image. This behavior makes it easy for us to emulate the giant bitmap concept with a combination of text that is overlaid with an image only where needed. But it's important to understand that the text and image are representing one and the same thing. When you scroll the text, you're scrolling the image too. If you insert a line in the middle of the screen, the image content below that line is going to move down along with the text. This behavior just works automatically when you have image slices attached to rows. The kitty protocol is completely different, though. Each graphics sequence generates a new image object (potentially at least), and these images are completely separate from the text (and from each other). If you insert a line in the middle of the screen, it'll shift the text down, without necessarily having any affect on images that cover that text. And kitty images can appear both above and below the text, and can also move independently of each other. That means you need to keep track of every kitty sequence that is output, and have some sort of protocol to manage when they're deleted if you're running low on memory. With sixel you don't have this problem, because multiple sixel sequences will just keep writing to the same image buffer. |
I forgot to reply to this. And I definitely agree that the current implementation isn't ideal, but the performance also isn't terrible, so it's not a huge priority for me. I personally care more about the missing VT functionality than I do about speed. For me the competition is 20 year old hardware running at 19200 baud. 😀 But I certainly wouldn't object to anyone else rewriting the DCS handler. I actually thought it might come up as part of #17336. |
This is definitely true. In particular I'm not sure I really understand yet how the "cell size" for sixels work. After my initial, cursory reading of the PR my understanding of
I wasn't aware it worked like that! I guess we could slice up the image whenever we encounter a VT/Console scroll sequence/command - we do the same for (BTW thanks for explaining the differences with the kitty protocol!) More importantly, it'd be great if we could still add an "ID" to the One option would be to just have a With this ID we can then key the |
The cell size is necessary to emulate the behavior of a real sixel terminal on which the character cell is a specific size, typically 10x20 pixels. Software intended to run on those terminals would often have predefined sixel images which would be expected to occupy a fixed area of the screen. For example, an image that is 100x200 pixels would be expected to occupy exactly 10x10 character cells. Linux terminals can't handle that, and as a result are mostly useless as terminal emulators. And to cope with terminals like that, modern sixel software is forced to query the terminal cell size and regenerate images on the fly to match whatever font the user happens to have selected at the time. No other software works this way, but for some reason people seem to think this is a good idea for terminals. So my expectation was that at some point, someone would request we do the same thing in Windows Terminal, and we might need an option to match the behavior of Linux terminals (i.e. have the cell size exactly match the active font). Although once initialized at a given size, it should remain at that size for the duration of the session if you expect it to behave sensibly (it could potentially readjust after an RIS, though, or possibly even when clearing the screen).
The sixel protcol itself doesn't necessarily produce rectangular output, but in most cases it probably would be a fixed width, and the way I've currently implemented it we typically reserve an area of the buffer that is as wide as the widest row for a given sequence (there are some edge cases where that isn't necessarily true though). But the problem with thinking of sixel as a simple image format is that it's often not used that way. Imagine for example that you're creating a painting application that allows the user to scribble all over the screen with their mouse. The way you'd implement that with sixel is that almost every pixel plot would be a new sixel sequence. For a standard 80x24 screen, that's going to use up 24 image slices at most. But if you're storing each sixel sequence as a separate image object, that's potentially 384000 images! And that's assuming your scribbling only touched each pixel once. In practice it'll likely be much worse than that. Any terminal with that architecture is either going to rapidly die, or drop content. It's just not practical.
That's definitely an option. I actually tried that at one point in the GDI renderer, because I thought fewer GDI calls might be faster, but it didn't seem to make any difference for me. If anything I think the overhead of combining the slices might have made it slightly slower, so in the end I just stuck with the simpler solution. But you may still find it's worthwhile in the Atlas renderer.
Once this is merged (assuming it does get merged), you're welcome to make whatever changes you think are necessary. And if you still think you can make it work with larger images spanning rows, that's fine too. I just wanted to make sure you were aware of the potential challenges of that approach. |
@j4james is, as usual, completely correct, but I believe he may be speaking a little tongue-in-cheek. Click here for some boring historyBack in the 1980s, full-screen software programs had to know each sixel hardware terminal's specific character cell size and overall dimensions. I think even back then people knew it was a bad idea to do it that way; both the ReGIS and Tektronix vector graphics protocols, which were contemporaneous, use virtual grids relative to the screen size so images can appear the same on any terminal. Sixel was never given that ability and so software that wanted to align graphics and text nicely had to be aware of the terminal type. Modern terminal software allows apps to work the same way but instead of a look-up table based on
Probably a dumb question, but couldn't you just clear everything when the size changes and send |
|
These are some examples I've been using for testing which demonstrate a few things you can do with sixel besides just blitting an image onto the screen. Paging animation
This example includes music to control the timing of the frames, so you can just Horse.mp4Color cycling (plasma)
Run with python: plasma.py Sixel.Plasma.mp4Color cycling (waterfall)
Waterfall.mp4Scrolling margins
Run with python: shuffle.py Sixel.Shuffle.mp4The observant among you may have noticed that some of these examples are running in Windows Terminals. This is because I was testing a merge of Leonard's passthrough branch and Atlas renderer. It's still a bit crashy at times, but it works! |
@j4james Okay, that was fantastic, particularly the Sixel Shuffle. Has any other terminal implemented rectangular copy capabilities yet? I love that you are interpreting the sixel colors immediately to allow for palette shifts. Are you going to allow for a shared palette between images, similar to |
It would be IMPOSSIBLE for me to overstate how much I love this. |
@hackerb9 In regards to the shuffle test specifically, MLTerm and RLogin are the only terminals I'm aware of that have all the required functionality, because that needs paging, rectangular copy, and horizontal margins. I'm not sure there are any other terminals that can do all of those things as well as sixel. But if we're just considering rectangular copy, then I believe Contour supports that too, and there may well be others.
Yes, the palette is always shared between images, at least in the sense that it's inherited. Palette changes in the second image won't affect the first though, so it's not a global palette like you have on a hardware terminal. But that's still something I'm hoping we might support one day. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 34/34 - I would say I understand roughly 85% of the Sixel parser, and I think the image slice design is fine--readable and straightforward.
I'm provisionally signing off given that this is a draft (I'll review incremental changes) despite the capture question.
That file doesn't actually have transparency! That's why it doesn't work in XTerm.
I'm glad you brought this up, because it is worth considering, but it's probably best discussed in a followup issue if/when we have sixel in Windows Terminal itself, because conhost doesn't support hyperlinks. Although I'll say now that I personally don't think hyperlinks should be associated with the sixel output. If you want to attach a link to the image, you can just apply it to the text area behind (easily doable with a |
Most of the important compatibility tests are in hackerb9's vt340test repo. My contributions are in the j4james directory. In an ideal world I would have liked to convert those into units tests of some kind, but I have no idea how that could be made to work. |
@j4james My understanding is sixels work pretty much like a dot-matrix printer with 6 needles and several colors of ink ribbons. The pixels shown in magenta in this picture are not set in any color in the sixel: In mlterm, it properly shows as leaving those pixels as background color: This technique of not outputting a pixel in any of the colors to keep the original background color works fine in mlterm, and is what I used in the test image. Are you saying the specs and real hardware do not handle that as transparency? and then, how do they achieve transparency if at all possible? |
@PhMajerus Your understanding is correct as far as it goes. What you're missing is that sixel terminals had a feature called "background select", which by default filled the background area before plotting any pixels. In that case, when you don't output a pixel, it's just going to end up with the background fill color, so won't be transparent. On a real terminal, the background fill color comes from entry 0 in the color table, and by default that should be black unless it's redefined. But it look to me like Mlterm is possibly filling with the active text background color, which gives you the impression that it's transparent, but it's really not, and it's technically incorrect. Xterm, as far as I can recall, will fill the background with a shade of gray, because your image defines color number 0 as
Parameter 2 of the DCS sequence specifies whether "background select" should be applied or not. If it's 0 or 2, the background is filled (the default behavior). If it's 1, the background won't be filled, and anywhere that you haven't plotted a pixel should be transparent (as you originally expected). So if you want your image to be transparent, it should start with something like And the best way to test transparency is to output your image over some actual text content, and make sure you can see the underlying text showing through. Most Linux terminals are unlikely to support it though. Last I tested I think Xterm was the only one that did. But that was a while back, so it's possible some of the others have improved since then. |
I think you're right:
You are absolutely right, I overlooked the P2 parameter in the VT330/340 reference manual. Changing P2 to 1 properly makes the background transparent in all terminals, so it works in mlterm, xterm, and conhost canary. So here is the corrected file: I still noticed a difference between conhost canary and mlterm and xterm. In your implementation, the next line of text overwrites the last line of the sixel image, while both mlterm and xterm are doing a newline before printing the next I suspect that is part of the problem you mentioned with sixels overlapping text. |
This PR add supports for two query sequences that are used to determine the pixel size of a character cell: * `CSI 16 t` reports the pixel size of a character cell directly. * `CSI 14 t` reports the pixel size of the text area, and when divided by the character size of the text area, you can get the character cell size indirectly (this method predates the introduction of `CSI 16 t`). These queries are used by Sixel applications that want to fit an image within specific text boundaries, so need to know how many cells would be covered by a particular pixel size, or vice versa. Our implementation of Sixel uses a virtual cell size that is always 10x20 (in order to emulate the VT340 more accurately), so these queries shouldn't really be needed, but some applications will fail to work without them. ## References and Relevant Issues Sixel support was added to conhost in PR #17421. ## Validation Steps Performed I've added some unit tests to verify that these queries are producing the expected responses, and I've manually tested on [XtermDOOM] (which uses `CSI 16 t`), and the [Notcurses] library (which uses `CSI 14 t`). [XtermDOOM]: https://gitlab.com/AutumnMeowMeow/xtermdoom [Notcurses]: https://github.com/dankamongmen/notcurses ## PR Checklist - [x] Tests added/passed
@PhMajerus The final text cursor position is meant to end up on top of the image, otherwise you wouldn't be able to output images on the bottom row of the display. If you want it below the image you need to add a linefeed yourself. Xterm and mlterm both use non-standard algorithms for positioning the cursor, but they don't actually match each other. The fact that they both ended up with the text below the image in your example is just a coincidence. In some cases the text will overlap the image and in other cases it won't. It depend on image size and font size. |
Thanks again @j4james, both for the time explaining how it is supposed to work and for implementing it. I can't wait for it to work in wt as well!
|
When we have a series of image slices of differing widths, which also don't align with the cell boundaries, we can get rounding errors in the scaling which makes the different slices appear misaligned. This PR fixes the issue by removing the 4 pixel width alignment that was enforced in the `ImageSlice` class, since that's not actually necessary when the pixels themselves are already 4 bytes in size. And without that, the widths should be correctly aligned with the cell boundaries. ## References and Relevant Issues The initial Sixel implementation was added in PR #17421. ## Validation Steps Performed I've confirmed that this fixes the rendering glitches reported in #17711, and all my existing Sixel tests still work as expected. Closes #17711
Any plan to make a new release with the great feature? Thank you very much. |
🔜 |
Summary of the Pull Request
This PR introduces basic support for the Sixel graphics protocol in
conhost, limited to the GDI renderer.
References and Relevant Issues
This is a first step towards supporting Sixel graphics in Windows
Terminal (#448), but that will first require us to have some form of
ConPTY passthrough (#1173).
Detailed Description of the Pull Request / Additional comments
There are three main parts to the architecture:
SixelParser
class takes care of parsing the incoming SixelDCS
sequence.
of
ImageSlice
objects, which represent per-row image content.affected row.
The parser is designed to support multiple conformance levels so we can
one day provide strict compatibility with the original DEC hardware. But
for now the default behavior is intended to work with more modern Sixel
applications. This is essentially the equivalent of a VT340 with 256
colors, so it should still work reasonably well as a VT340 emulator too.
Validation Steps Performed
Thanks to the work of @hackerb9, who has done extensive testing on a
real VT340, we now have a fairly good understanding of how the original
Sixel hardware terminals worked, and I've tried to make sure that our
implementation matches that behavior as closely as possible.
I've also done some testing with modern Sixel libraries like notcurses
and jexer, but those typically rely on the terminal implementing certain
proprietary Xterm query sequences which I haven't included in this PR.